Comparison of language modelling techniques for Russian and English

Authors

  • Edward W. D. Whittaker
  • Philip C. Woodland
Abstract

In this paper, the main differences between language modelling of Russian and English are examined. A Russian corpus and a comparable English corpus are described. The effects of high inflectionality in Russian, and the relationship between the out-of-vocabulary rate and vocabulary size, are investigated. Standard word and class N-gram language modelling techniques are applied to the two corpora, and perplexity results are reported. A novel approach to the modelling of inflected languages is proposed, and its efficacy is compared with that of the other techniques.
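
The relationship between out-of-vocabulary (OOV) rate and vocabulary size mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes the vocabulary is simply the most frequent training words, and the function name `oov_rate` and the toy English sentences are hypothetical, chosen only for illustration.

```python
from collections import Counter


def oov_rate(train_tokens, test_tokens, vocab_size):
    """Fraction of test tokens missing from a frequency-ranked vocabulary.

    Assumption: the vocabulary is the `vocab_size` most frequent
    word forms in the training tokens.
    """
    counts = Counter(train_tokens)
    vocab = {word for word, _ in counts.most_common(vocab_size)}
    if not test_tokens:
        return 0.0
    misses = sum(1 for word in test_tokens if word not in vocab)
    return misses / len(test_tokens)


# Toy data (hypothetical); a real study would use large corpora.
train = "the cat sat on the mat the dog sat on the rug".split()
test = "the cat sat on the sofa".split()

# The OOV rate falls as the vocabulary grows.
for size in (1, 3, 7):
    print(size, round(oov_rate(train, test, size), 3))
```

In a highly inflected language, each lemma appears in many surface forms, so a vocabulary of a given size would be expected to cover fewer test tokens than in English; running the same function over two comparable corpora is one simple way to see that effect.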

Similar resources

Collocational Processing in Two Languages: A psycholinguistic comparison of monolinguals and bilinguals

With the renewed interest in the field of second language learning for the knowledge of collocating words, research findings in favour of holistic processing of formulaic language could support the idea that these language units facilitate efficient language processing. This study investigated the difference between processing of a first language (L1) and a second language (L2) of congruent col...

Mental Representations of Lyrical Prose

The article analyzes mental representations of Russian lyrical prose texts. The texts demonstrate collective memory engrams that are defined by cultural and historical legacy of the nation and authors’ creative world perception. In architectonics of a lyrical prose text, sense perception reveals itself in accumulated underlying meanings and wisdom conveyed by expressive means. The author’s inte...

Rhyming Compounds as Elements of a Language Game (In Russian and English Languages)

The article is devoted to the study of composite rhyming compounds as a means of word formation games. It explores the place of this category of words in the lexical system and peculiarities of their use in the Russian and English languages. Authors of the article represent compound words as a special lexical subgroup. On the specific publicistic material are revealed the peculiarities of compo...

Dictionary of Abstract and Concrete Words of the Russian Language: A Methodology for Creation and Application

The paper describes the first stage of a project on creating an electronic dictionary with numerical estimates of the degree of abstractness and concreteness of Russian words. Our approach is to integrate data obtained from several different sources: text corpora, psycholinguistic experiments, published dictionaries, markers of abstractness (certain suffixes) and a translation of a similar dict...

Selection of Foreign Language Teaching Content in Russian Master of Laws (LLM) Graduate Programs

The Master's degree was integrated into the system of Russian Higher Education several decades ago; however, teaching foreign languages at this level still needs further analysis, including the training of postgraduate law students. The article investigates the principal components of foreign language teaching in Master of Laws Graduate Programs (considering the case of the English language) on the bas...

Publication date: 1998